Which resemblance is useful to predict phrase boundary rise labels for Japanese expressive text-to-speech synthesis, numerically-expressed stylistic or distribution-based semantic?

Authors

Hideharu Nakajima

Hideyuki Mizuno

Osamu Yoshioka

Satoshi Takahashi

Abstract

To establish expressive text-to-speech synthesis, current research studies both the processing of input text and the rendering of natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this paper investigates a novel feature for predicting phrase boundary tone labels, which transcribe the local fundamental frequency (F0) changes that frequently appear at phrase-final positions in expressive speech. To this end, we examined distribution-based semantic features consisting of i) word surface strings, ii) their part-of-speech tags taken from a phrase, and iii) the presence or absence of a pause at the final position of the phrase. These differ from conventional numerically-expressed stylistic features such as the positions, lengths, and distances of the phrase. Through experiments on Japanese expressive speech such as conversational speech and advertisement speech, we confirmed that the proposed features attain performance equal to or better than that of conventional features. These results suggest that the distribution-based semantic features might be useful for predicting phrase boundary rise labels in conversational speech, and might be as useful as conventional numerically-expressed stylistic features in advertisement speech.
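The contrast between the two feature families described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Phrase` container, its field names, the feature-key format, and the choice of words as the length unit are all assumptions made here for clarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Phrase:
    # Hypothetical container; field names are illustrative, not from the paper.
    words: List[Tuple[str, str]]  # (surface string, part-of-speech tag)
    pause_after: bool             # whether a pause follows the phrase
    index: int                    # 0-based position of the phrase in the sentence
    n_phrases: int                # number of phrases in the sentence


def semantic_features(p: Phrase) -> Dict[str, float]:
    """Sparse indicator features: word surfaces, POS tags, and final pause."""
    feats: Dict[str, float] = {}
    for surface, pos in p.words:
        feats[f"w={surface}"] = 1.0
        feats[f"pos={pos}"] = 1.0
    feats["pause"] = 1.0 if p.pause_after else 0.0
    return feats


def stylistic_features(p: Phrase) -> Dict[str, float]:
    """Numeric features: position, length, and distance to the sentence end."""
    return {
        "position": float(p.index),
        "length_words": float(len(p.words)),  # length unit is an assumption
        "dist_to_end": float(p.n_phrases - 1 - p.index),
    }


# Example phrase: sentence-final, two words, followed by a pause.
phrase = Phrase(
    words=[("行く", "verb"), ("の", "particle")],
    pause_after=True,
    index=2,
    n_phrases=3,
)
print(semantic_features(phrase))
print(stylistic_features(phrase))
```

Either dictionary could then be passed to any classifier that accepts sparse or dense feature maps. The abstract does not name the model used, so none is assumed here.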

Similar resources

Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields

When synthesizing speech from Japanese text, correct assignment of accent nuclei for input text with arbitrary contents is indispensable in obtaining naturally-sounding synthetic speech. A phenomenon called accent sandhi occurs in utterances of Japanese; when a word is uttered in a sentence, its accent nucleus may change depending on the contexts of preceding/succeeding words. This paper descri...


Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis

Realizing expressive text-to-speech synthesis needs both text processing and the rendering of natural expressive speech. This paper focuses on the former as a front-end task in the production of synthetic speech, and investigates a novel method for predicting emphasized accent phrases from advertisement text information. For this purpose, we examine features that can be accurately extracted by ...


Automatic labeling of Japanese prosody using J-ToBI style description

Speech corpora with prosodic labels are getting more and more important not only for speech synthesis but also for discourse modeling. A widely used labeling system for Japanese prosody, J-ToBI, however, is insufficient for applications like discourse modeling and it even lacks an accurate method for automatic labeling. In this paper, we propose an automatic labeling method for J-ToBI style des...


Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese

A new set of context labels was developed for HMM-based speech synthesis of Japanese. The conventional labels include those directly related to sentence length, such as number of “mora” and order of breath group in a sentence. When reading a sentence, it is unlikely that we count its total length before utterance. Also a set of increased number of labels is required to handle sentences with var...


Phrase-boundary translation model using shallow syntactic labels

Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...




Publication date: 2013
